## ECE 4300 Homework #1

**1.3** Describe the steps that transform a program written in a high-level language such as C into a representation that is directly executed by a computer processor.

The program is first written in a high-level language such as C. A compiler then translates it into assembly language, a symbolic representation of the instructions the hardware can execute. Next, an assembler translates each assembly-language instruction into its binary machine-language form, which is what the processor directly executes.

**1.4** Assume a color display using 8 bits for each of the primary colors (red, green, blue) per pixel and a frame size of 1280 x 1024.

**a.** What is the minimum size in bytes of the frame buffer to store a frame?

8 bits for 3 colors: $8 \times 3 = 24$ bits, and 24 bits / 8 = 3 bytes per pixel

Total number of pixels: $1280 \times 1024 = 1310720$ pixels

Minimum size in bytes: 1310720 pixels x 3 bytes = **3932160 bytes**

**b.** How long would it take, at minimum, for the frame to be sent over a 100 Mbit/s network?

Bytes to bits: 3932160 x 8 = 31457280 bits

At minimum: 31457280 bits / (100 x 10<sup>6</sup> bits/s) = **0.315 seconds**
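
The two steps above (bytes per frame, then bits divided by link rate) can be reproduced with a short Python sketch. The helper names are illustrative, and the 100 Mbit/s link is assumed to carry only frame data, with no protocol overhead.

```python
# Problem 1.4: frame-buffer size and minimum transfer time.
BITS_PER_CHANNEL = 8
CHANNELS = 3                 # red, green, blue
WIDTH, HEIGHT = 1280, 1024
LINK_BITS_PER_SEC = 100e6    # 100 Mbit/s; assumes no protocol overhead


def frame_bytes(width: int, height: int) -> int:
    """Bytes needed to store one uncompressed frame."""
    bytes_per_pixel = BITS_PER_CHANNEL * CHANNELS // 8  # 24 bits -> 3 bytes
    return width * height * bytes_per_pixel


def transfer_seconds(n_bytes: int, link_bps: float) -> float:
    """Minimum time to send n_bytes over a link of link_bps bits per second."""
    return n_bytes * 8 / link_bps


size = frame_bytes(WIDTH, HEIGHT)
print(size)                                                 # 3932160
print(round(transfer_seconds(size, LINK_BITS_PER_SEC), 3))  # 0.315
```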

**1.5** Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and a CPI of 2.2.

**a.** Which processor has the highest performance expressed in instructions per second?

P1: 3 GHz / 1.5 = (3 x 10^9) / 1.5 = 2 x 10^9 Instructions Per Second

P2: 2.5 GHz / 1.0 = (2.5 x 10^9) / 1.0 = 2.5 x 10^9 Instructions Per Second

P3: 4.0 GHz / 2.2 = (4.0 x 10^9) / 2.2 = 1.82 x 10^9 Instructions Per Second

Therefore, P2 has the highest performance, at 2.5 x 10^9 instructions per second.
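
As a quick check, the ranking by instructions per second can be computed directly. This is a minimal sketch; `instructions_per_second` and the `processors` table are illustrative names holding the clock rates and CPIs from the problem.

```python
# Problem 1.5a: instructions per second = clock rate / CPI.
processors = {
    "P1": (3.0e9, 1.5),   # (clock rate in Hz, CPI)
    "P2": (2.5e9, 1.0),
    "P3": (4.0e9, 2.2),
}


def instructions_per_second(clock_hz: float, cpi: float) -> float:
    return clock_hz / cpi


for name, (clock, cpi) in processors.items():
    print(f"{name}: {instructions_per_second(clock, cpi):.3g} instructions/s")

fastest = max(processors, key=lambda p: instructions_per_second(*processors[p]))
print("Highest performance:", fastest)  # P2
```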

**b.** If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions.

P1: (3 x 10^9)(10) = **30 x 10^9 cycles**; (30 x 10^9) / 1.5 = **20 x 10^9 instructions**

P2: (2.5 x 10^9)(10) = **25 x 10^9 cycles**; (25 x 10^9) / 1.0 = **25 x 10^9 instructions**

P3: (4.0 x 10^9)(10) = **40 x 10^9 cycles**; (40 x 10^9) / 2.2 = **18.18 x 10^9 instructions**
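
The same numbers follow from cycles = clock rate x time and instructions = cycles / CPI. A minimal sketch, with illustrative names, under the problem's assumption that each program runs for exactly 10 s:

```python
# Problem 1.5b: cycles and instructions executed in a fixed 10 s run.
EXEC_TIME_S = 10.0
processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}


def cycles_and_instructions(clock_hz: float, cpi: float, seconds: float):
    cycles = clock_hz * seconds      # cycles = clock rate x time
    return cycles, cycles / cpi      # instructions = cycles / CPI


for name, (clock, cpi) in processors.items():
    cycles, instrs = cycles_and_instructions(clock, cpi, EXEC_TIME_S)
    print(f"{name}: {cycles:.4g} cycles, {instrs:.4g} instructions")
```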

**c.** We are trying to reduce the execution time by 30% but this leads to an increase of 20% in the CPI. What clock rate should we have to get this time reduction?

```
Execution time = 0.7 x Execution time(old)
CPI = 1.2 x CPI(old)
Instruction count (IC) is unchanged

Execution time = IC x CPI / clock rate, so:
IC x 1.2 x CPI(old) / clock rate = 0.7 x IC x CPI(old) / clock rate(old)

1.2 / clock rate = 0.7 / clock rate(old)

Clock rate = (1.2 / 0.7) x clock rate(old)

Clock rate = 1.71 x clock rate(old)
```
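
Applying the 1.71 factor to the clock rates from part a gives concrete targets. The problem does not say which processor is meant, so running it for all three is an assumption made here for illustration; `required_clock` is a hypothetical helper.

```python
# Problem 1.5c: same instruction count, time x0.7, CPI x1.2.
# From time = IC * CPI / f, the required clock is f_new = (1.2 / 0.7) * f_old.
TIME_FACTOR = 0.7
CPI_FACTOR = 1.2


def required_clock(f_old_hz: float) -> float:
    return f_old_hz * CPI_FACTOR / TIME_FACTOR


for name, f_old in {"P1": 3.0e9, "P2": 2.5e9, "P3": 4.0e9}.items():
    print(f"{name}: {required_clock(f_old) / 1e9:.2f} GHz")
# P1: 5.14 GHz, P2: 4.29 GHz, P3: 6.86 GHz
```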

**1.6** Consider two different implementations of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (classes A, B, C, and D). P1 has a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3; P2 has a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2.

Given a program with a dynamic instruction count of 1.0E6 instructions divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which implementation is faster?

**a.** What is the global CPI for each implementation?

```
P1: CPI = (0.10)(1) + (0.20)(2) + (0.50)(3) + (0.20)(3) = 2.6
P2: CPI = (0.10)(2) + (0.20)(2) + (0.50)(2) + (0.20)(2) = 2.0
```
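
The weighted CPI can be carried through to execution time (cycles / clock rate), which answers the question of which implementation is faster. In this sketch (illustrative names) P1 needs 2.6E6 cycles at 2.5 GHz, about 1.04 ms, while P2 needs 2.0E6 cycles at 3 GHz, about 0.667 ms, so P2 is faster.

```python
# Problem 1.6: weighted (global) CPI, cycles, and execution time.
INSTRUCTION_COUNT = 1.0e6
MIX = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}   # fraction per class
IMPLS = {
    "P1": (2.5e9, {"A": 1, "B": 2, "C": 3, "D": 3}),  # (clock Hz, CPI per class)
    "P2": (3.0e9, {"A": 2, "B": 2, "C": 2, "D": 2}),
}


def global_cpi(class_cpi: dict) -> float:
    """Average CPI weighted by the instruction mix."""
    return sum(MIX[c] * class_cpi[c] for c in MIX)


def exec_time(clock_hz: float, class_cpi: dict) -> float:
    cycles = global_cpi(class_cpi) * INSTRUCTION_COUNT
    return cycles / clock_hz


for name, (clock, cpis) in IMPLS.items():
    print(f"{name}: CPI = {global_cpi(cpis):.2f}, "
          f"time = {exec_time(clock, cpis) * 1e3:.3f} ms")
```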

**b.** Find the clock cycles required in both cases.

P1: Clock cycles = (2.6)(1.0E6) = **2.6E6 cycles**

P2: Clock cycles = (2.0)(1.0E6) = **2.0E6 cycles**

**1.7** Compilers can have a profound impact on the performance of an application. Assume that for a program, compiler A results in a dynamic instruction count of 1.0E9 and has an execution time of 1.1 s, while compiler B results in a dynamic instruction count of 1.2E9 and an execution time of 1.5 s.

**a.** Find the average CPI for each program given that the processor has a clock cycle time of 1 ns.
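
Part a has no worked answer here. A minimal sketch using CPI = execution time / (instruction count x cycle time) gives 1.1 for compiler A and 1.25 for compiler B, which are the values part b relies on; `average_cpi` is an illustrative name.

```python
# Problem 1.7a: average CPI from time, instruction count and cycle time.
CYCLE_TIME_S = 1e-9  # 1 ns clock cycle


def average_cpi(exec_time_s: float, instruction_count: float,
                cycle_time_s: float = CYCLE_TIME_S) -> float:
    """CPI = total cycles / instructions = (time / cycle time) / IC."""
    return exec_time_s / (instruction_count * cycle_time_s)


print(f"Compiler A: CPI = {average_cpi(1.1, 1.0e9):.2f}")  # 1.10
print(f"Compiler B: CPI = {average_cpi(1.5, 1.2e9):.2f}")  # 1.25
```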

**b.** Assume the compiled programs run on two different processors. If the execution times on the two processors are the same, how much faster is the clock of the processor running compiler A's code versus the clock of the processor running compiler B's code?

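```
Equal execution times T, so clock rate = cycles / T and T cancels in the ratio
Cycles A = CPI(A) x IC(A) = (1.1)(1.0E9) = 1.1E9
Cycles B = CPI(B) x IC(B) = (1.25)(1.2E9) = 1.5E9
Clock rate A / Clock rate B = (1.1E9) / (1.5E9) = 0.73
```

The processor running compiler A's code needs only 0.73 times the clock rate of the one running compiler B's code; equivalently, B's clock must be about 1.36 times faster.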
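
The ratio can be checked numerically; the variable names are illustrative, and the shared execution time never appears because it cancels.

```python
# Problem 1.7b: equal execution time T, so f = cycles / T and T cancels.
CPI_A, IC_A = 1.1, 1.0e9
CPI_B, IC_B = 1.25, 1.2e9

cycles_a = CPI_A * IC_A          # 1.1e9 cycles
cycles_b = CPI_B * IC_B          # 1.5e9 cycles
ratio = cycles_a / cycles_b      # f_A / f_B

print(f"f_A / f_B = {ratio:.2f}")      # 0.73
print(f"f_B / f_A = {1 / ratio:.2f}")  # 1.36
```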

**c.** A new compiler is developed that uses only 6.0E8 instructions and has an average CPI of 1.1. What is the speedup of using this new compiler versus using compiler A or B on the original processor?
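
Part c has no worked answer here. The sketch below assumes that "the original processor" is the 1 ns-cycle machine from part a, so the new compiler's time is 6.0E8 x 1.1 x 1 ns = 0.66 s, compared against the 1.1 s and 1.5 s given for compilers A and B. Names are illustrative.

```python
# Problem 1.7c: speedup of the new compiler on the original 1 ns processor.
CYCLE_TIME_S = 1e-9
new_time_s = 6.0e8 * 1.1 * CYCLE_TIME_S   # IC x CPI x cycle time = 0.66 s

old_times_s = {"A": 1.1, "B": 1.5}        # execution times given in 1.7
speedups = {name: t / new_time_s for name, t in old_times_s.items()}
for name, s in speedups.items():
    print(f"Speedup vs compiler {name}: {s:.2f}")
# Speedup vs compiler A: 1.67
# Speedup vs compiler B: 2.27
```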

**1.11** The SPEC CPU2006 bzip2 benchmark, running on an AMD Barcelona, has an instruction count of 2.389E12, an execution time of 750 s, and a reference time of 9650 s.

**1.11.1** Find the CPI if the clock cycle time is 0.333 ns.
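
A minimal sketch for 1.11.1, using CPI = execution time / (instruction count x clock cycle time); the constant names are illustrative. The result, about 0.94, is the value 1.11.7 later compares against.

```python
# Problem 1.11.1: CPI = execution time / (instruction count x cycle time).
IC = 2.389e12
EXEC_TIME_S = 750.0
CYCLE_TIME_S = 0.333e-9

cpi = EXEC_TIME_S / (IC * CYCLE_TIME_S)
print(f"CPI = {cpi:.2f}")  # 0.94
```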

**1.11.2** Find the SPECratio.
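
SPECratio is the reference time divided by the measured execution time. A one-line check, with illustrative names:

```python
# Problem 1.11.2: SPECratio = reference time / measured execution time.
REFERENCE_TIME_S = 9650.0
EXEC_TIME_S = 750.0

spec_ratio = REFERENCE_TIME_S / EXEC_TIME_S
print(f"SPECratio = {spec_ratio:.2f}")  # 12.87
```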

**1.11.3** Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% without affecting the CPI.
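
Since CPU time = instruction count x CPI x clock cycle time, holding the CPI and the clock fixed makes CPU time scale directly with the instruction count. A small sketch, assuming the 750 s baseline from 1.11 (names are illustrative):

```python
# Problem 1.11.3: with CPI and clock fixed, CPU time scales with IC.
BASE_TIME_S = 750.0
IC_FACTOR = 1.10

new_time_s = BASE_TIME_S * IC_FACTOR
increase_pct = (new_time_s / BASE_TIME_S - 1) * 100
print(f"{new_time_s:.1f} s (+{increase_pct:.1f}%)")  # 825.0 s (+10.0%)
```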

**1.11.4** Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% and the CPI is increased by 5%.
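
With both factors changing, the CPU time scales by 1.10 x 1.05 = 1.155. A sketch assuming the same 750 s baseline and an unchanged clock, with illustrative names:

```python
# Problem 1.11.4: both IC (+10%) and CPI (+5%) scale the CPU time.
BASE_TIME_S = 750.0
IC_FACTOR, CPI_FACTOR = 1.10, 1.05

new_time_s = BASE_TIME_S * IC_FACTOR * CPI_FACTOR
increase_pct = (IC_FACTOR * CPI_FACTOR - 1) * 100
print(f"{new_time_s:.2f} s (+{increase_pct:.1f}%)")  # 866.25 s (+15.5%)
```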

**1.11.5** Find the change in the SPECratio for this change.
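
This sketch assumes "this change" refers to the 1.11.4 scenario (10% more instructions and a 5% higher CPI) and recomputes SPECratio = reference time / execution time before and after; names are illustrative.

```python
# Problem 1.11.5: SPECratio before and after the 1.11.4 change.
REFERENCE_TIME_S = 9650.0
old_time_s = 750.0
new_time_s = 750.0 * 1.10 * 1.05   # CPU time from 1.11.4

old_ratio = REFERENCE_TIME_S / old_time_s
new_ratio = REFERENCE_TIME_S / new_time_s
change_pct = (new_ratio / old_ratio - 1) * 100
print(f"{old_ratio:.2f} -> {new_ratio:.2f} ({change_pct:.1f}%)")
# 12.87 -> 11.14 (-13.4%)
```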

**1.11.6** Suppose that we are developing a new version of the AMD Barcelona processor with a 4 GHz clock rate. We have added some additional instructions to the instruction set in such a way that the number of instructions has been reduced by 15%. The execution time is reduced to 700 s and the new SPECratio is 13.7. Find the new CPI.

$$1-15\% = 0.85 \rightarrow (0.85)(2.389E12) = 2.03E12$$
  
CPI = (700)(4E9) / (2.03E12) = **1.38**
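
The same arithmetic in a short Python sketch (illustrative names), keeping the reduced instruction count unrounded:

```python
# Problem 1.11.6: new CPI = time x clock rate / new instruction count.
IC_OLD = 2.389e12
IC_NEW = 0.85 * IC_OLD          # 15% fewer instructions
EXEC_TIME_S = 700.0
CLOCK_HZ = 4.0e9

cpi_new = EXEC_TIME_S * CLOCK_HZ / IC_NEW
print(f"CPI = {cpi_new:.2f}")  # 1.38
```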

**1.11.7** This CPI value is larger than that obtained in 1.11.1, as the clock rate was increased from 3 GHz to 4 GHz. Determine whether the increase in the CPI is similar to that of the clock rate. If they are dissimilar, why?

```
(4 GHz / 3 GHz) x 100 = 133.33
(1.38 / 0.94) x 100 = 146.81
```

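The increase in CPI (about 47%) is not similar to the increase in clock rate (about 33%) because the number of instructions also changed between 1.11.1 and 1.11.6. With 15% fewer instructions, the CPI rises by an additional factor of 1 / 0.85 ≈ 1.18, only partly offset by the shorter execution time (700 / 750 ≈ 0.93).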
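
One way to see the mismatch is to split the CPI ratio into factors, since CPI = time x clock rate / instruction count. The sketch below is an illustration with hypothetical names; it takes the old clock as 1 / 0.333 ns from 1.11.1 rather than exactly 3 GHz, so its ratios differ slightly from the rounded 133% and 147% above.

```python
# Problem 1.11.7: split the CPI ratio into its three factors.
# CPI = T * f / IC, so CPI_new / CPI_old = (T_new/T_old) * (f_new/f_old) * (IC_old/IC_new)
IC_OLD, T_OLD, F_OLD = 2.389e12, 750.0, 1 / 0.333e-9   # 1.11.1 (0.333 ns cycle)
IC_NEW, T_NEW, F_NEW = 0.85 * IC_OLD, 700.0, 4.0e9      # 1.11.6

time_factor = T_NEW / T_OLD          # ~0.933 (execution time fell)
clock_factor = F_NEW / F_OLD         # ~1.332 (clock rose)
ic_factor = IC_OLD / IC_NEW          # ~1.176 (fewer instructions)

cpi_old = T_OLD * F_OLD / IC_OLD
cpi_new = T_NEW * F_NEW / IC_NEW
print(f"clock ratio = {clock_factor:.3f}, CPI ratio = {cpi_new / cpi_old:.3f}")
```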

**1.11.8** By how much has the CPU time been reduced?

CPU time reduction = $750 - 700 = 50$ seconds, which is $50 / 750 \approx 6.67\%$ of the original CPU time.

**1.11.9** For a second benchmark, libquantum, assume an execution time of 960 ns, CPI of 1.61, and clock rate of 3 GHz. If the execution time is reduced by an additional 10% without affecting the CPI and with a clock rate of 4 GHz, determine the number of instructions.

```
Instructions = (960E-9)(3E9) / 1.61 = 1788.82
Reduced by 10% -> (1-10%)(960E-9) = 8.64E-7 seconds
Instructions = (8.64E-7)(4E9) / (1.61) = 2146.58
```
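
The formula instructions = time x clock rate / CPI can be wrapped in a small helper; `instruction_count` is an illustrative name.

```python
# Problem 1.11.9: instruction count = execution time x clock rate / CPI.
CPI = 1.61


def instruction_count(time_s: float, clock_hz: float, cpi: float = CPI) -> float:
    return time_s * clock_hz / cpi


ic_original = instruction_count(960e-9, 3e9)       # 960 ns at 3 GHz
ic_faster = instruction_count(0.9 * 960e-9, 4e9)   # 10% less time at 4 GHz
print(round(ic_original, 2), round(ic_faster, 2))  # 1788.82 2146.58
```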

**1.11.10** Determine the clock rate required to give a further 10% reduction in CPU time while maintaining the number of instructions and with the CPI unchanged.

```
(1-10%)(960E-9) = 8.64E-7 seconds
Clock rate = (1788.82)(1.61) / (8.64E-7) = 3.33E9 Hz = 3.33 GHz
```
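
Because the instruction count and CPI are unchanged, the total cycle count (960 ns x 3 GHz = 2880 cycles) is fixed, and the clock rate is simply cycles divided by the new time. A minimal sketch with illustrative names:

```python
# Problem 1.11.10: same IC and CPI, 10% less CPU time.
# f = IC x CPI / T, and IC x CPI = 960 ns x 3 GHz = 2880 cycles.
CYCLES = 960e-9 * 3e9           # total cycles, unchanged
new_time_s = 0.9 * 960e-9       # 8.64e-7 s
clock_hz = CYCLES / new_time_s
print(f"{clock_hz / 1e9:.2f} GHz")  # 3.33 GHz
```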

**1.11.11** Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20% while the number of instructions is unchanged.

```
CPI = (1-15%) (1.61) = 1.37

CPU time = (1-20%) (960E-9) = 7.68E-7 seconds

Clock rate = (1788.82)(1.37) / (7.68E-7) = 3.19E9 Hz = 3.19 GHz
```
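
Working from the scaling factors instead of rounded intermediates, f_new = f_old x 0.85 / 0.80. This sketch (illustrative names) gives 3.1875 GHz, which agrees with the 3.19 GHz above once rounded.

```python
# Problem 1.11.11: IC unchanged, CPI x0.85, CPU time x0.80.
# f = IC x CPI / T, so f_new = f_old x 0.85 / 0.80.
F_OLD_HZ = 3e9
f_new_hz = F_OLD_HZ * 0.85 / 0.80
print(f"{f_new_hz / 1e9:.4f} GHz")  # 3.1875 GHz
```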

**1.12** Section 1.10 cites as a pitfall the utilization of a subset of the performance equation as a performance metric. To illustrate this, consider the following two processors. P1 has a clock rate of 4 GHz, average CPI of 0.9, and requires the execution of 5.0E9 instructions. P2 has a clock rate of 3 GHz, an average CPI of 0.75, and requires the execution of 1.0E9 instructions.

**1.12.1** One usual fallacy is to consider the computer with the largest clock rate as having the largest performance. Check if this is true for P1 and P2.

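P1: (0.9)(5.0E9) / 4E9 = **1.125 seconds**

P2: (0.75)(1.0E9) / 3E9 = **0.25 seconds**

P1 has the higher clock rate but takes 4.5 times as long as P2, so the fallacy does not hold here.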
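
The CPU times can be reproduced with CPU time = instruction count x CPI / clock rate; a small sketch with illustrative names:

```python
# Problem 1.12.1: CPU time = IC x CPI / clock rate.
procs = {
    "P1": {"clock_hz": 4e9, "cpi": 0.9, "ic": 5.0e9},
    "P2": {"clock_hz": 3e9, "cpi": 0.75, "ic": 1.0e9},
}


def cpu_time(p: dict) -> float:
    return p["ic"] * p["cpi"] / p["clock_hz"]


for name, p in procs.items():
    print(f"{name}: {p['clock_hz'] / 1e9:.0f} GHz, {cpu_time(p):.3f} s")
# P1: 4 GHz, 1.125 s
# P2: 3 GHz, 0.250 s
```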

**1.12.2** Another fallacy is to consider that the processor executing the largest number of instructions will need a larger CPU time. Considering that processor P1 is executing a sequence of 1.0E9 instructions and that the CPI of processors P1 and P2 do not change, determine the number of instructions that P2 can execute in the same time that P1 needs to execute 1.0E9 instructions.

P1: (0.9)(1.0E9) / 4E9 = 0.225 seconds

P2: (0.225)(3E9) / 0.75 = **9E8 instructions**
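
A sketch of the two-step calculation (time on P1, then the instructions P2 fits in that time), with illustrative names:

```python
# Problem 1.12.2: instructions P2 completes in the time P1 needs for 1.0E9.
P1_CLOCK_HZ, P1_CPI = 4e9, 0.9
P2_CLOCK_HZ, P2_CPI = 3e9, 0.75

t_p1 = 1.0e9 * P1_CPI / P1_CLOCK_HZ      # 0.225 s
ic_p2 = t_p1 * P2_CLOCK_HZ / P2_CPI      # instructions = time x f / CPI
print(f"{t_p1:.3f} s -> {ic_p2:.3g} instructions")  # 0.225 s -> 9e+08 instructions
```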

**1.12.3** A common fallacy is to use MIPS (millions of instructions per second) to compare the performance of two different processors, and consider that the processor with the largest MIPS has the largest performance. Check if this is true for P1 and P2.

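MIPS P1: 5E9 / (1.125 x 10<sup>6</sup>) = **4444.44 MIPS**

MIPS P2: 1.0E9 / (0.25 x 10<sup>6</sup>) = **4000 MIPS**

P1 has the larger MIPS rating, yet P2 finishes its program in 0.25 s versus 1.125 s, so the larger MIPS value does not identify the faster processor.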
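
A sketch of the MIPS figures, reusing the instruction counts and the CPU times from 1.12.1 (names are illustrative):

```python
# Problem 1.12.3: MIPS = instruction count / (execution time x 1e6).
procs = {"P1": (5.0e9, 1.125), "P2": (1.0e9, 0.25)}  # (IC, CPU time in s from 1.12.1)

mips = {name: ic / (t * 1e6) for name, (ic, t) in procs.items()}
for name, value in mips.items():
    print(f"{name}: {value:.2f} MIPS")
# P1: 4444.44 MIPS
# P2: 4000.00 MIPS
```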

**1.12.4** Another common performance figure is MFLOPS (millions of floating-point operations per second), defined as MFLOPS = No. FP operations / (execution time × 1E6) but this figure has the same problems as MIPS. Assume that 40% of the instructions executed on both P1 and P2 are floating-point instructions. Find the MFLOPS figures for the programs.

MFLOPS P1: (0.4)(5E9) / (1.125 x 10<sup>6</sup>) = 1777.78 MFLOPS

MFLOPS P2: (0.4)(1.0E9) / (0.25 x 10<sup>6</sup>) = 1600 MFLOPS
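
The MFLOPS figures follow the same pattern, with 40% of each instruction count treated as floating-point operations per the problem statement. A sketch with illustrative names, reusing the 1.12.1 CPU times:

```python
# Problem 1.12.4: MFLOPS = FP operations / (execution time x 1e6), FP share 40%.
FP_FRACTION = 0.4
procs = {"P1": (5.0e9, 1.125), "P2": (1.0e9, 0.25)}  # (IC, CPU time in s)

mflops = {name: FP_FRACTION * ic / (t * 1e6) for name, (ic, t) in procs.items()}
for name, value in mflops.items():
    print(f"{name}: {value:.2f} MFLOPS")
# P1: 1777.78 MFLOPS
# P2: 1600.00 MFLOPS
```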